“Happiness is the only thing that humans desire for its own sake,” said the great philosopher Aristotle in the Nicomachean Ethics. I feel happy because I saw a beautiful cat on the street, or ate a delicious pumpkin pie made by my friend Jessica, or maybe just stayed at home for a whole day relaxing and doing nothing. The reasons for happiness are various, but what exactly makes people happy? When people say they’re happy, what kind of emotions do they actually feel? Does the cause of happiness differ for different kinds of people? Let’s find the answers by analyzing the HappyDB dataset, where 100,000 happy moments are recorded!
First, let’s check the happy moment descriptions and see how happy moments differ from routine moments by analyzing the complexity of the sentences and the emotional states.
From the 100,392 records of happy moments, we extract 139,760 sentences. On the one hand, each record has 1.39 sentences on average, and records with only one sentence account for 83% of the total, which means that in most cases happy moments are neither long stories nor hard to explain. Further, when we look into the sentences, we notice that only 16% of them have more than 20 words, the threshold we use for long sentences. Compared with the typical sentence length in general text (10-20 words), most sentences in our happy moments have 5-15 words, which also shows that happy moment descriptions are shorter than general text.
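The counts above can be reproduced with a few lines of Python. This is only a minimal sketch: the regex-based sentence splitter, the function names and the four toy moments are my own assumptions, not the tooling actually used for HappyDB.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def sentence_stats(moments, long_threshold=20):
    """Sentences per record, share of one-sentence records, share of long sentences."""
    per_record = [split_sentences(m) for m in moments]
    sentences = [s for record in per_record for s in record]
    n_records = len(per_record)
    single = sum(1 for record in per_record if len(record) == 1)
    # A sentence is "long" when it has more than long_threshold words.
    long_count = sum(1 for s in sentences if len(s.split()) > long_threshold)
    return {
        "sentences_per_record": len(sentences) / n_records,
        "single_sentence_share": single / n_records,
        "long_sentence_share": long_count / len(sentences),
    }

# Toy records, invented for illustration only.
moments = [
    "I saw a beautiful cat on the street.",
    "My friend made a pumpkin pie. It was delicious!",
    "I stayed at home all day.",
    "I got a new job.",
]
stats = sentence_stats(moments)
print(stats)
```

A real tokenizer would handle abbreviations and missing punctuation better than this regex, so the exact percentages may shift slightly depending on the splitter.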
On the other hand, when we look at the number of verbs in each sentence, we find that 42% of sentences have 3 verbs or more. The number of verbs can be considered a proxy for sentence complexity in HappyDB, and the high proportion of sentences with multiple verbs indicates that people express quite complex thoughts in their “short” moments.
Let’s take a look at the VAD (also called PAD) scores of the happy moments. VAD stands for the Valence-Arousal-Dominance model (the “P” in PAD stands for Pleasure), which provides a score for each lemmatized word on scales of pleasure-displeasure (valence), excitement-calm (arousal), and control-inhibition (dominance). After computing the mean of each VAD score over the happy moments and comparing it with the scores of the ten sections of the Guardian corpus, we can see that HappyDB’s VAD score is closest to that of the travel section (\(V\approx6.2, A\approx4.0, D\approx5.7\)). In addition, each of HappyDB’s valence, arousal and dominance scores is larger than the corresponding highest score among the ten sections, which is quantitative evidence that when people express their happiness, they are highly pleased, excited and in control.
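To make the averaging step concrete, here is a small sketch of how a mean VAD score could be computed from lemmatized words. The mini-lexicon and its values are invented for illustration, and `mean_vad` is a hypothetical helper name; a real analysis would load a published VAD lexicon instead.

```python
from statistics import mean

# Hypothetical mini-lexicon: lemma -> (valence, arousal, dominance).
# The values are made up for this example.
VAD_LEXICON = {
    "friend": (8.0, 5.0, 6.0),
    "pie": (7.0, 4.0, 6.0),
    "cat": (7.0, 4.5, 5.5),
    "job": (6.0, 5.0, 6.5),
}

def mean_vad(lemmas, lexicon):
    """Average V, A and D over the lemmas found in the lexicon; None if none match."""
    hits = [lexicon[w] for w in lemmas if w in lexicon]
    if not hits:
        return None
    # zip(*hits) regroups the tuples into one sequence per dimension.
    return tuple(mean(dim) for dim in zip(*hits))

lemmas = ["friend", "bake", "pie"]  # "bake" is not in the toy lexicon and is skipped
print(mean_vad(lemmas, VAD_LEXICON))
```

Words missing from the lexicon are simply ignored here, which is one possible choice; the coverage of the lexicon therefore affects the averages.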
As shown above, happy moment records do show a specific pattern and emotional state. Now let’s go deeper and discuss what leads people to this highly emotional happiness, and try to identify different patterns of happiness for different kinds of people.
Let’s first have a look at the distribution of our data.
We can see that HappyDB has more records from men (57%) than from women (43%), more from nonparents (61%) than from parents (39%), and more from single people (54%) than from married people (41%). The workers are mainly between 20 and 40 years old. As for the distribution of countries, the USA contributes 79% of the data and India comes next (17%). Data from Russia and China, which account for a large proportion of the world’s population, seem to be lacking.
Now we have a general idea of where our data come from. Let’s try to figure out what contributes to happiness!
As we would like to identify interesting words for each happy moment, we use TF-IDF to weigh each term within each moment. It highlights terms that are more specific to a particular moment. Then we plot the wordcloud for the whole HappyDB data.
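For readers unfamiliar with TF-IDF, the sketch below shows one common variant: term frequency times log(N / document frequency). The exact weighting scheme used for the wordcloud is not stated here, so treat the formula, the function name `tf_idf` and the toy documents as assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy tokenized moments, invented for illustration only.
docs = [
    ["friend", "dinner", "friend"],
    ["friend", "birthday"],
    ["job", "friend"],
]
weights = tf_idf(docs)
print(weights[0])
```

In this toy example “friend” appears in every document, so its weight is zero, while rarer terms such as “dinner” get a positive weight. That is exactly the effect TF-IDF is meant to have: it rewards terms that are specific to a document.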
As we can see, “friend” is the most frequently used word in the happy moments, and words about people, such as “family”, “son” and “daughter”, are also a major focus. Verbs like “feel”, “played”, “watched”, “enjoyed”, “bought” and “received” form another category of words that appear a lot in HappyDB.
So, according to the wordcloud, we are happy mainly because of the people around us and the cheerful actions in our past 24 hours or 3 months. Naturally, we also want to know whether the word focus of these 2 categories (24-hour and 3-month reflection periods) differs. Let’s compare the wordcloud for the 24-hour (24h) reflection period (left) with the one for the 3-month (3m) reflection period (right).
Both wordclouds follow the same pattern as the overall wordcloud. However, for the 24h reflection period, verbs like “watched”, “played” and “enjoyed” appear much more often than for the 3m reflection period, whereas words like “job”, “school” and “life” appear more for the 3m reflection period. Further, the 24h reflection period focuses more on “dinner”, “lunch”, “food” and “delicious”, and the 3m reflection period focuses more on “birthday” or “event”.
It seems that for the 24h reflection period, people tend to focus more on instant happiness like “played a game” or “had a delicious meal”. For the 3m reflection period, people tend to focus on long-term happiness such as “job” or “school”, or on special events such as “birthday”. This makes sense: if the reflection period is only 24 hours, people will more often mention everyday happy moments. We could also deduce that if people were asked to recall the happy moments of their whole life, the answers would likely be things that happen rarely, such as marriage, the birth of a child or a first job.
HappyDB provides each happy moment with a predicted category, which is one of achievement, affection, bonding, enjoy_the_moment, exercise, leisure and nature.
“Affection” and “achievement” account for 67.8% of all happy moments, which shows that our happiness comes mainly from interaction with loved ones and from success at work, at school or in something that took great effort.
Let’s see how this category distribution varies across age groups.
As the plot shows, people aged 27-37 focus more on affection than people aged 17-27, who in turn focus more on achievement. This might be because people aged 17-27 are younger and thus more willing and ambitious to prove themselves, leading to a higher emphasis on achievement, whereas people aged 27-37 are probably already settled and more likely to be married, and thus put more emphasis on affection.
Let’s compare the categories of happiness in other groups.
The happiness of married people comes from affection more than from achievement. Although we have more records from single people than from married people, as mentioned before, within the happy moments classified as affection the proportion of married people exceeds that of single people. Compared with married people, single people put more emphasis on exercise, bonding and achievement.
We can conduct the same analysis on gender and parenthood. The mosaic plot shows that women put more emphasis on affection and nature than men, and parents put more emphasis on affection, nature and enjoy_the_moment than nonparents.
These results are quite straightforward and match what we would expect: we expect women to be more emotional, and married people and parents to be more dedicated to their families, so all of them would probably weigh affection more than achievement, and their happiness might come more from affection.
Our happiness can be classified into different categories. But what does happiness stand for? When people say they are happy, what do they actually feel? Satisfaction? Joy? Luck? To answer this question, we apply sentiment analysis using the NRC sentiment lexicon, which is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive).
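A lexicon-based analysis of this kind boils down to counting, for every word in a moment, the emotions and sentiments the lexicon associates with it. The sketch below illustrates the idea; the three lexicon entries are invented in the spirit of the NRC lexicon rather than copied from it, and `emotion_counts` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical excerpt: word -> associated emotion/sentiment labels.
# Illustrative entries only, not taken from the real NRC lexicon.
EMOTION_LEXICON = {
    "friend": {"joy", "trust", "positive"},
    "birthday": {"joy", "anticipation", "surprise", "positive"},
    "delicious": {"joy", "positive"},
}

def emotion_counts(tokens, lexicon):
    """Count how many tokens are associated with each emotion or sentiment."""
    counts = Counter()
    for token in tokens:
        # Tokens missing from the lexicon contribute nothing.
        counts.update(lexicon.get(token, ()))
    return counts

tokens = "my friend baked a delicious birthday cake".split()
counts = emotion_counts(tokens, EMOTION_LEXICON)
print(counts)
```

Summing such counts over all moments (and normalizing by the number of words or moments) gives one score per emotion, which can then be shown as a heatmap.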
The darker the orange, the higher the score. Not surprisingly, HappyDB has a very high score for positive sentiment. Among the emotions, HappyDB has higher scores for joy, anticipation and trust. It is interesting that it does not score very high for “surprise” but scores very high for “trust”, because in my opinion trust is a positive emotion that has less connection with happiness. This interesting finding suggests that people treat happiness more seriously: a single surprise would probably not be a source of happiness, but heartfelt trust accounts for a large part of it.
Let’s see the positive scores of happy moments in different countries.
Positive Scores in different countries
The darker the blue, the higher the positive score.
First, we can see on the plot that a lot of data are missing, for example for Russia, China and a large part of the African countries, which is a big impediment to the analysis of HappyDB. In the future, if possible, we should collect more data from these countries. If we want to focus only on the analysis of HappyDB in the USA, future investigations should also collect state information for each worker.
Based on the data we have, Algeria, Egypt and Ethiopia have very high positive sentiment scores, and they are all located in Africa. But countries with lower positive scores, such as Uganda and Zambia, are also in Africa, which means that Africa’s happy moments have a greater variance in positive sentiment. Notice that the country with the lowest positive score is Iraq.
Let’s go deeper into the differences in happiness between countries. We use topic modelling to assign one of nine topics to each happy moment and obtain the topic scores (probabilities), and then try to cluster the countries by sentiment scores and topic scores.
The official HappyDB dataset includes a topic_dict file, which gives nine topics and their keywords. So I try to use LDA to detect nine topics from HappyDB and assign the topics according to topic_dict.
Below are the top 20 terms in each topic.
After checking the top 20 terms against the nine topics in the topic_dict file, I realized it is unreasonable to follow these nine topics strictly, since some predicted topics do not contain any keywords from these nine topics. So I defined some new topics myself, and the final nine topics are “love”, “school_job”, “people”, “food”, “family”, “shopping”, “vacation”, “entertainment” and “celebration”.
Below are the topics with their counts in HappyDB. “School_Job” and “Love” are the 2 main topics of HappyDB, which can be viewed as counterparts of “Achievement” and “Affection” in the predicted categories.
Based on the average topic scores for each country, I assigned each country the topic its happy moments focus on most. For example, people in the United States focus more on “people”.
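The assignment of a main topic to each country can be sketched as: average the topic probabilities over all moments of a country, then pick the topic with the highest mean. The country codes, the probabilities and the function name `dominant_topic_by_country` below are made up for illustration.

```python
from collections import defaultdict

def dominant_topic_by_country(rows):
    """rows: iterable of (country, {topic: probability}). Returns {country: topic}."""
    sums = defaultdict(lambda: defaultdict(float))
    n_moments = defaultdict(int)
    for country, scores in rows:
        n_moments[country] += 1
        for topic, p in scores.items():
            sums[country][topic] += p
    result = {}
    for country, topic_sums in sums.items():
        means = {t: s / n_moments[country] for t, s in topic_sums.items()}
        # The topic with the highest average probability wins.
        result[country] = max(means, key=means.get)
    return result

# Toy topic scores for three moments, invented for illustration only.
rows = [
    ("USA", {"people": 0.6, "food": 0.3, "family": 0.1}),
    ("USA", {"people": 0.3, "food": 0.4, "family": 0.3}),
    ("EGY", {"people": 0.2, "food": 0.1, "family": 0.7}),
]
print(dominant_topic_by_country(rows))
```

Note that the second toy moment from the USA is mostly about “food”, yet the country’s dominant topic is still “people” once the scores are averaged.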
Now that we have the sentiment scores and topic scores of different countries, let’s try to cluster the countries into 3 groups! Only countries with more than 50 records in HappyDB are considered in this part; with fewer records, the result would not be very informative.
From the above plot, we can see that Egypt deviates considerably from the overall pattern of the happy moments. Jamaica also deviates a lot, but it is still considered part of cluster 1. Egypt’s main happy topic is “family”, and it has a relatively high score in sentiment analysis.
Cluster 1 also contains Thailand, Dominicia, Bulgaria, Macedonia, Serbia and Portugal. Macedonia, Bulgaria and Serbia are closer to each other, so we might assume that happy moments depend on, or at least reflect, the location of a country.
The other cluster is a big one, including the United States, Mexico, Canada, Australia, Turkey, France, Brazil, India, Viet Nam and the Philippines. In these countries, happiness topics focus mainly on “people” and “shopping”, and their positive sentiment scores are higher than those of cluster 1.
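The clustering algorithm is not named in this write-up, so as an illustration only, here is a minimal k-means sketch (Lloyd’s algorithm) that groups countries by a feature vector of sentiment and topic scores. The two features, the six toy points and the choice of initial centroids are all assumptions made for this example.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm. points and centroids are lists of equal-length tuples."""
    centroids = list(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Toy features per country: (positive score, average "family" topic score).
points = [
    (0.30, 0.10), (0.32, 0.12), (0.31, 0.11),  # a large, similar group
    (0.20, 0.30), (0.22, 0.28),                # a second group
    (0.45, 0.60),                              # an outlier, like Egypt in the plot
]
# Fixed starting centroids keep the example deterministic.
labels, centroids = kmeans(points, [points[0], points[3], points[5]])
print(labels)
```

With real data, the features should be put on comparable scales before clustering, and a random initialization would normally be repeated several times, since k-means can settle in different local optima.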
By analyzing HappyDB, we can see that people are highly emotional when talking about happy moments. We can also deduce the reasons for happiness:
Happiness comes mainly from affection (with friends, family, lovers) and achievement (job, school).
For the 24h reflection period, people tend to focus more on instant happiness and everyday happy moments like “played a game” or “had a delicious meal”, whereas for the 3m reflection period, people tend to focus on long-term happiness such as “life”, “job” and “school”.
Women, parents, married people and people aged 27-37 seem to draw more happiness from affection, whereas men, nonparents and single people seem to draw more from achievement.
Happiness stands for joy, anticipation and trust rather than surprise.
Different countries show different patterns of happiness; the clustering shows that Egypt has a pattern distinct from that of any other country.